Efficient and accurate methods for updating generalized linear models with multiple feature additions

Authors

  • Amit Dhurandhar
  • Marek Petrik
Abstract

In this paper, we propose an approach for learning regression models efficiently in an environment where multiple features and data points are added incrementally in a multistep process. At each step, any finite number of features may be added, and hence the setting is not amenable to low-rank updates. We show that our approach is not only efficient and optimal for ordinary least squares, weighted least squares, generalized least squares and ridge regression, but also more generally for generalized linear models and lasso regression that use iteratively reweighted least squares for maximum likelihood estimation. Our approach, instantiated to linear settings, is closely related to the partitioned matrix inversion mechanism based on the Schur complement. For arbitrary regression methods, even a relaxation of the approach is no worse than using the model from the previous step, or using a model that learns on the additional features and optimizes the residual of the model at the previous step. Such problems are commonplace in complex manufacturing operations consisting of hundreds of steps, where multiple measurements are taken at each step to monitor the quality of the final product. Accurately predicting whether the finished product will meet specifications at each step, or at least at important intermediate steps, can be extremely useful in enhancing productivity. We further validate our claims through experiments on synthetic and real industrial data sets.
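The connection to partitioned matrix inversion mentioned in the abstract can be illustrated for plain ordinary least squares. The following is a minimal sketch, not the authors' algorithm: it applies the textbook block-inverse identity based on the Schur complement to update (X^T X)^{-1} when new feature columns are appended, and compares the result with the two baselines the abstract names (the previous-step model, and a model fitted on the new features against the previous residual). The function name `add_features_ols` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def add_features_ols(XtX_inv, X_old, X_new, y):
    """Update (X^T X)^{-1} and the OLS coefficients after appending columns.

    Hypothetical helper: uses the standard partitioned-inverse identity,
    assuming the old Gram matrix inverse XtX_inv is already available.
    """
    B = X_old.T @ X_new              # cross-products between old and new features
    D = X_new.T @ X_new              # Gram matrix of the new features
    AinvB = XtX_inv @ B
    S = D - B.T @ AinvB              # Schur complement of the old block
    S_inv = np.linalg.inv(S)
    top_left = XtX_inv + AinvB @ S_inv @ AinvB.T
    top_right = -AinvB @ S_inv
    full_inv = np.block([[top_left, top_right], [top_right.T, S_inv]])
    X_full = np.hstack([X_old, X_new])
    beta = full_inv @ (X_full.T @ y)
    return full_inv, beta


# Synthetic data (an assumption for this sketch, not data from the paper).
rng = np.random.default_rng(0)
n = 200
X_old = rng.normal(size=(n, 3))
X_new = rng.normal(size=(n, 2))
y = (X_old @ np.array([1.0, -2.0, 0.5])
     + X_new @ np.array([3.0, 1.5])
     + 0.1 * rng.normal(size=n))

# Model from the previous step, using only the old features.
XtX_inv = np.linalg.inv(X_old.T @ X_old)
beta_prev = XtX_inv @ (X_old.T @ y)
resid = y - X_old @ beta_prev

# Baseline: fit only the new features to the previous residual.
gamma, *_ = np.linalg.lstsq(X_new, resid, rcond=None)

# Full update through the Schur complement.
full_inv, beta_full = add_features_ols(XtX_inv, X_old, X_new, y)
X_full = np.hstack([X_old, X_new])
beta_direct, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Training sums of squared errors for the three models.
sse_prev = float(resid @ resid)
sse_resid = float(np.sum((resid - X_new @ gamma) ** 2))
sse_full = float(np.sum((y - X_full @ beta_full) ** 2))
print(sse_prev, sse_resid, sse_full)
```

For least squares, the training error of the full update cannot exceed that of the residual fit, which in turn cannot exceed that of the previous-step model, because each later model optimizes over a superset of the earlier one's choices. The sketch covers only the unregularized linear case; the ridge, weighted, and generalized linear model variants discussed in the abstract are not shown.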


Similar resources

The Negative Binomial Distribution Efficiency in Finite Mixture of Semi-parametric Generalized Linear Models

Introduction: Selecting the appropriate statistical model for the response variable is one of the most important problems in finite mixtures of generalized linear models. One distribution that poses a problem in finite mixtures of semi-parametric generalized statistical models is the Poisson distribution. In this paper, to overcome overdispersion and computational burden, finite ...


A comparative QSAR study of aryl-substituted isobenzofuran-1(3H)-ones inhibitors

A comparative workflow, including linear and non-linear QSAR models, was carried out to evaluate the predictive accuracy of the models and to predict the inhibition activity of a series of aryl-substituted isobenzofuran-1(3H)-ones. The data set, consisting of 34 compounds, was randomly divided into training and test sets. Molecular descriptors were selected using the genetic algorithm (GA) as a f...


Which Methodology is Better for Combining Linear and Nonlinear Models for Time Series Forecasting?

Both theoretical and empirical findings have suggested that combining different models can be an effective way to improve on the predictive performance of each individual model. This is especially the case when the models in the ensemble are quite different. Hybrid techniques that decompose a time series into its linear and nonlinear components are one of the most important kinds of the hybrid model...


Modeling of temperature in friction stir welding of duplex stainless steel using multivariate lagrangian methods, linear extrapolation and multiple linear regression

In this study, the temperature in friction stir welding of duplex stainless steel was investigated. First, the temperature was modeled and estimated at different distances from the center of the stir zone using the multivariate Lagrangian function. Then, the linear extrapolation method and the multiple linear regression method were used to estimate the temperature outside the range and ...



Journal:
  • Journal of Machine Learning Research

Volume 15


Publication date: 2014